In this section, we are trying to answer the question: Retail needs to worry about who has money to spend - what has changed about who is working and earning money?
library(ipumsr)
library(vtable)
## Loading required package: kableExtra
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.1.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::group_rows() masks kableExtra::group_rows()
## x dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(jtools)
library(haven)
library(rdrobust)
ddi <- read_ipums_ddi("cps_00002.xml")
data <- read_ipums_micro(ddi)
## Use of data from IPUMS CPS is subject to conditions including that users should
## cite the data appropriately. Use command `ipums_conditions()` for more details.
# Join our data set with the provided industry names. Remove rows that have no matching industry name.
ind_df <- read_csv('indnames.csv')
## Rows: 8185 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): indname
## dbl (1): ind
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ind_df <- rename(ind_df, IND = ind)
new_df <- left_join(data, ind_df, by = "IND")
df <- new_df %>%
filter(!is.na(indname))
In our EARNWEEK column, 9999.99 is used instead of N/A or blanks for people with unreported earnings. We must first clean the data so that these 9999.99 values don’t affect our models.
df <- df %>%
mutate(EARNWEEK = ifelse(EARNWEEK > 9999, NA_real_, EARNWEEK))
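As a quick check of the cleaning step, the sketch below applies the same rule to a small made-up vector (the values are invented for illustration, not taken from the CPS extract). It shows how much the 9999.99 placeholder would inflate a mean if it were left in.

```r
# Toy weekly earnings; 9999.99 stands in for "not reported" (values are made up)
earnweek <- c(850, 1200, 9999.99, 430, 9999.99)

# Same rule as above: anything above 9999 becomes NA
cleaned <- ifelse(earnweek > 9999, NA_real_, earnweek)

raw_mean   <- mean(earnweek)               # pulled upward by the placeholder
clean_mean <- mean(cleaned, na.rm = TRUE)  # average of the three reported values

print(c(raw = raw_mean, cleaned = clean_mean))
```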
Next we must select only the columns that apply to our question. We decided to choose YEAR, MONTH, EMPSTAT, OCC, CLASSWKR, WKSTAT, EARNWEEK, and indname as our variables.
earnweekdf <- df %>%
select(c(YEAR, MONTH, EMPSTAT, OCC, CLASSWKR, WKSTAT, EARNWEEK, indname))
We want our dates in “year-month-day” format instead of separate YEAR and MONTH columns. We combine these columns into a single Date column, where each month is represented by its first day, for analysis later.
earnweekdf$Date <- paste(earnweekdf$MONTH, earnweekdf$YEAR)
earnweekdf$Date <- my(earnweekdf$Date)
earnweekdf <- earnweekdf %>%
select(-c(YEAR, MONTH))
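The date step can also be reproduced without lubridate. This is a minimal base-R sketch on a made-up three-row data frame; it assumes, as `my()` does, that each month is represented by its first day.

```r
# Toy YEAR and MONTH columns (invented rows)
toy <- data.frame(YEAR = c(2019L, 2020L, 2020L), MONTH = c(12L, 1L, 2L))

# Build an ISO "year-month-01" string and convert it to a Date
toy$Date <- as.Date(sprintf("%d-%02d-01", toy$YEAR, toy$MONTH))

# Drop the original columns, as in the pipeline above
toy <- toy[, "Date", drop = FALSE]
print(toy)
```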
Since we’re measuring the effect of COVID on the retail sector as well as the rest of the economy, we must add a cutoff date to mark the start of the recession. We use 2020/01/31 as our cutoff date. Any date after 2020/01/31 is categorized as “PostCOVID”. This was around the time the WHO declared COVID a public health emergency of international concern (30 January 2020); the pandemic declaration followed on 11 March 2020.
earnweekdf <- earnweekdf %>%
mutate(PostCOVID = Date > ymd(20200131))
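Because every Date is the first of a month, the comparison with 2020/01/31 splits the data between January and February 2020. A small base-R check on made-up dates illustrates this:

```r
# Four first-of-month dates around the cutoff (invented for illustration)
dates  <- as.Date(c("2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01"))
cutoff <- as.Date("2020-01-31")

# Same comparison as in the mutate() call above
post_covid <- dates > cutoff
print(data.frame(Date = dates, PostCOVID = post_covid))

# With first-of-month dates this is the same split as "February 2020 or later"
same_split <- identical(post_covid, dates >= as.Date("2020-02-01"))
```

So January 2020 counts as pre-COVID and February 2020 is the first post-COVID month.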
To build our summary data, we group the observations by industry name and by the PostCOVID flag. We then average weekly earnings within each group, which gives one pre-cutoff average and one post-cutoff average per industry.
effectonearnings <- earnweekdf %>%
group_by(indname, PostCOVID) %>%
summarize(avgearn = mean(EARNWEEK, na.rm = TRUE))
## `summarise()` has grouped output by 'indname'. You can override using the `.groups` argument.
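To make the grouping step concrete, here is a small sketch with invented numbers. It uses base R's `aggregate()` rather than the `group_by()`/`summarize()` pipeline above, and the industry labels "A" and "B" are placeholders.

```r
# Toy data: two placeholder industries, one missing earnings value
toy <- data.frame(
  indname   = c("A", "A", "A", "B", "B", "B"),
  PostCOVID = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE),
  EARNWEEK  = c(800, NA, 900, 1000, 1100, 1300)
)

# The formula interface drops rows with a missing EARNWEEK before averaging,
# which plays the role of na.rm = TRUE here
avg <- aggregate(EARNWEEK ~ indname + PostCOVID, data = toy, FUN = mean)
print(avg)
```

Each industry ends up with one row per period, which is the shape the bar chart needs.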
We wanted to take a quick look before running our regressions. Graphing the average earnings in each industry helps us develop an understanding of our data. The bar chart places the pre-cutoff and post-cutoff averages side by side. As we can see, there is a slight decrease in average earnings from pre COVID to post COVID for most industries.
The code below removes any missing values from the summary, filters the post-COVID rows and takes the mean of their averages, then does the same for the pre-COVID rows.
TRUE marks the average earnings after the COVID cutoff date. FALSE marks the average earnings before the COVID cutoff date.
effectonearnings <- effectonearnings %>%
drop_na()
PostCOVIDearnavg <- effectonearnings %>%
filter(PostCOVID == TRUE)
PostCOVIDearnavg <- mean(PostCOVIDearnavg$avgearn)
# PostCOVID == FALSE selects the pre-cutoff rows, so name the result accordingly
PreCOVIDearnavg <- effectonearnings %>%
  filter(PostCOVID == FALSE)
PreCOVIDearnavg <- mean(PreCOVIDearnavg$avgearn)
ggplot(data = effectonearnings, aes(x = PostCOVID, y = avgearn, fill = indname)) + geom_bar(stat = "identity", position = position_dodge()) + theme(legend.title = element_text(size = 5), legend.text = element_text(size = 5)) + guides(color = guide_legend(override.aes = list(size = 0.5))) + guides(shape = guide_legend(override.aes = list(size = 0.5))) + labs(x = 'Before or After Covid', y = 'Average Earnings ($)', fill = 'Industry') + scale_x_discrete(labels=c("FALSE" = "Before", "TRUE" = "After"))
Looking at the chart above, finance, information, manufacturing, other services, and public administration had the highest average earnings both pre COVID and post COVID. Retail was negatively affected by COVID. Most of the industries had a slight decrease in average earnings.
Right off the bat, we can see that some industries had very little change in their earnings and some had a greater decrease. We’ll be able to see what these differences are in later analysis. However, from a quick look, the change in average weekly earnings for each industry group before and after COVID does not seem to be large.
We’re testing to see the difference in earnings per week before and after the COVID cutoff date. Our group decided to fit a regression of earnings per week on date for each industry, before and after the COVID cutoff date (2020/01/31). The first graph shows industry earnings per week over the full period (macro, long-term). The second graph zooms in on earnings per week close to the COVID cutoff date (micro, immediate effect).
TRUE (teal line) shows earnings per week after the COVID cutoff date. FALSE (red line) shows earnings per week before it.
We will really only be doing analysis for the retail industry and other industries that have a statistically significant effect.
Initially, the retail industry took a loss in earnings per week after the COVID cutoff. In later years, it recovered from that initial loss.
earnweekdf1 <- earnweekdf %>%
filter(indname == 'Retail Trade')
m1 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf1)
export_summs(m1)
| | Model 1 |
|---|---|
| (Intercept) | -576.28 *** |
| | (87.44) |
| Date | 0.07 *** |
| | (0.00) |
| PostCOVIDTRUE | -681.13 * |
| | (309.74) |
| Date:PostCOVIDTRUE | 0.04 * |
| | (0.02) |
| N | 109177 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
To interpret the retail regression, start with the intercept. Date is on the x-axis, and R stores dates as the number of days since 1970-01-01, so the intercept is the predicted weekly earnings at Date = 0. Our data start long after that, which is why the intercept is negative and not meaningful on its own. The Date coefficient is 0.07, meaning that before the cutoff weekly earnings increase by about 0.07 per day. The PostCOVIDTRUE coefficient is the shift associated with COVID when Date is 0, so on its own it is not very useful to us. The important term is the interaction between Date and PostCOVID. In the after-COVID period, earnings increase by 0.04 more per day than before, so the estimated post-cutoff slope is 0.07 + 0.04 = 0.11 per day.
However, the table marks this interaction only at the p < 0.05 level, and the model explains about 1% of the variation in earnings (R2 = 0.01), so any association between COVID and earnings in this industry is a weak one.
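The slope arithmetic can be checked on simulated data. The sketch below builds noise-free earnings with a pre-cutoff slope of 0.07 and a post-cutoff slope of 0.11 (chosen to mirror the rounded retail coefficients; the data themselves are invented), fits the same `Date * PostCOVID` specification, and recovers the post-cutoff slope as the sum of two coefficients.

```r
# Monthly dates from 2018 through 2021, like the first-of-month Date column
Date <- seq(as.Date("2018-01-01"), as.Date("2021-12-01"), by = "month")
PostCOVID <- Date > as.Date("2020-01-31")
day <- as.numeric(Date)  # days since 1970-01-01, which is how lm() sees a Date

# Invented, noise-free earnings: slope 0.07 before the cutoff, 0.11 after it
EARNWEEK <- ifelse(PostCOVID, -1300 + 0.11 * day, -576 + 0.07 * day)
sim <- data.frame(Date, PostCOVID, EARNWEEK)

fit <- lm(EARNWEEK ~ Date * PostCOVID, data = sim)
b <- coef(fit)

pre_slope  <- unname(b["Date"])                            # slope before the cutoff
post_slope <- unname(b["Date"] + b["Date:PostCOVIDTRUE"])  # slope after the cutoff
print(round(c(pre = pre_slope, post = post_slope), 2))
```

The interaction coefficient is the change in slope, not the post-cutoff slope itself, which is why the two terms are added.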
earnweekdf1 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 407726 rows containing non-finite values (stat_smooth).
## Warning: Removed 407726 rows containing missing values (geom_point).
This graph shows us the same relationship. The lines are the linear fits for the pre-COVID and post-COVID periods. For retail, the effect is very small compared with the spread of individual earnings.
earnweekdf1 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 499606 rows containing non-finite values (stat_smooth).
## Warning: Removed 499606 rows containing missing values (geom_point).
By taking a closer look, we can see that the average retail employee’s weekly earnings did not change much around the COVID cutoff date. So, as we go through the rest of the industries, we will only analyze the ones that have statistically significant results.
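Since the same model is fitted to each industry, the per-industry steps could also be written as a loop. The following is only a sketch on simulated data with two placeholder industries (the names, sample sizes, and effect sizes are invented). It fits the interaction model per industry and collects the interaction estimate and p-value, so that industries with a significant change in slope can be picked out.

```r
set.seed(1)

# Simulate one placeholder industry; slope_change is the true change in slope
make_industry <- function(name, slope_change) {
  day  <- rep(seq(-300, 300, by = 30), each = 20)  # days relative to the cutoff
  post <- day > 0
  earn <- 900 + 0.1 * day + slope_change * day * post +
    rnorm(length(day), sd = 50)
  data.frame(indname = name, day = day, post = post, earn = earn)
}
toy <- rbind(make_industry("Industry A", 0.5),  # slope changes after the cutoff
             make_industry("Industry B", 0))    # no change in slope

# Fit the same interaction model once per industry
fits <- lapply(split(toy, toy$indname),
               function(d) lm(earn ~ day * post, data = d))

# Collect the interaction estimate and its p-value for each industry
interaction_table <- do.call(rbind, lapply(names(fits), function(nm) {
  co <- summary(fits[[nm]])$coefficients
  data.frame(indname  = nm,
             estimate = co["day:postTRUE", "Estimate"],
             p_value  = co["day:postTRUE", "Pr(>|t|)"])
}))
print(interaction_table)
```

Filtering `interaction_table` on `p_value < 0.05` would then give the list of industries to discuss.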
The Professional, Scientific, and Management, and Administrative and Waste Management Services industry benefited after the COVID cutoff date. An increase in earnings per week appeared immediately, and earnings continued to increase afterward.
earnweekdf2 <- earnweekdf %>%
filter(indname == 'Professional, Scientific, and Management, and Administrative and Waste Management Services')
m2 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf2)
export_summs(m2)
| | Model 1 |
|---|---|
| (Intercept) | -885.44 *** |
| | (120.40) |
| Date | 0.12 *** |
| | (0.01) |
| PostCOVIDTRUE | 975.87 * |
| | (413.10) |
| Date:PostCOVIDTRUE | -0.05 * |
| | (0.02) |
| N | 106383 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
In this model, we can see a statistically significant effect of COVID on earnings for the Professional, Scientific, and Management, and Administrative and Waste Management Services industry. Our interaction shows a 0.05 decrease in the slope after COVID (from 0.12 to about 0.07 per day), so although earnings kept increasing over the course of a few years, they increased at a slower rate for this industry.
earnweekdf2 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 462986 rows containing non-finite values (stat_smooth).
## Warning: Removed 462986 rows containing missing values (geom_point).
earnweekdf2 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 551509 rows containing non-finite values (stat_smooth).
## Warning: Removed 551509 rows containing missing values (geom_point).
earnweekdf3 <- earnweekdf %>%
filter(indname == 'Finance and Insurance, and Real Estate and Rental and Leasing')
m3 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf3)
export_summs(m3)
| | Model 1 |
|---|---|
| (Intercept) | -833.08 *** |
| | (147.41) |
| Date | 0.12 *** |
| | (0.01) |
| PostCOVIDTRUE | -284.26 |
| | (510.28) |
| Date:PostCOVIDTRUE | 0.02 |
| | (0.03) |
| N | 65636 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
Here the estimated effect of COVID on employee weekly earnings is close to zero and not statistically significant. There may have been an effect, but the model does not detect one.
earnweekdf3 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 251720 rows containing non-finite values (stat_smooth).
## Warning: Removed 251720 rows containing missing values (geom_point).
earnweekdf3 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 306294 rows containing non-finite values (stat_smooth).
## Warning: Removed 306294 rows containing missing values (geom_point).
earnweekdf5 <- earnweekdf %>%
filter(indname == 'Arts, Entertainment, and Recreation, and Accommodation and Food Services')
m5 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf5)
export_summs(m5)
| | Model 1 |
|---|---|
| (Intercept) | -603.63 *** |
| | (78.83) |
| Date | 0.07 *** |
| | (0.00) |
| PostCOVIDTRUE | -86.53 |
| | (295.43) |
| Date:PostCOVIDTRUE | 0.00 |
| | (0.02) |
| N | 88783 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
earnweekdf5 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 355288 rows containing non-finite values (stat_smooth).
## Warning: Removed 355288 rows containing missing values (geom_point).
earnweekdf5 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 430686 rows containing non-finite values (stat_smooth).
## Warning: Removed 430686 rows containing missing values (geom_point).
earnweekdf6 <- earnweekdf %>%
filter(indname == 'Public Administration')
m6 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf6)
export_summs(m6)
| | Model 1 |
|---|---|
| (Intercept) | -240.35 |
| | (141.06) |
| Date | 0.08 *** |
| | (0.01) |
| PostCOVIDTRUE | -279.26 |
| | (478.43) |
| Date:PostCOVIDTRUE | 0.02 |
| | (0.03) |
| N | 58366 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
earnweekdf6 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 186103 rows containing non-finite values (stat_smooth).
## Warning: Removed 186103 rows containing missing values (geom_point).
earnweekdf6 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 234591 rows containing non-finite values (stat_smooth).
## Warning: Removed 234591 rows containing missing values (geom_point).
earnweekdf7 <- earnweekdf %>%
filter(indname == 'Agriculture, Forestry, Fishing, and Hunting, and Mining')
m7 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf7)
export_summs(m7)
| | Model 1 |
|---|---|
| (Intercept) | -326.60 |
| | (249.06) |
| Date | 0.08 *** |
| | (0.01) |
| PostCOVIDTRUE | 1374.78 |
| | (890.35) |
| Date:PostCOVIDTRUE | -0.08 |
| | (0.05) |
| N | 18772 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
Here, COVID appears to have a pretty substantial effect on earnings in the Agriculture, Forestry, Fishing, and Hunting, and Mining industries, although neither the PostCOVIDTRUE term nor the interaction is marked as significant at p < 0.05 in the table above. The interaction (-0.08) roughly cancels the pre-COVID Date slope (0.08), so the increase in weekly earnings across the years is basically non-existent after the cutoff, with a post-COVID slope close to 0.00 per day.
earnweekdf7 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 112989 rows containing non-finite values (stat_smooth).
## Warning: Removed 112989 rows containing missing values (geom_point).
The graph here shows these almost constant weekly earnings after COVID hit.
earnweekdf7 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 128673 rows containing non-finite values (stat_smooth).
## Warning: Removed 128673 rows containing missing values (geom_point).
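The Transportation and Warehousing, and Utilities industry appears to have benefited after the COVID cutoff date. The fitted lines show an immediate increase in earnings per week that continues through the end of our sample, although neither the PostCOVIDTRUE term nor the interaction is significant at p < 0.05.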
earnweekdf8 <- earnweekdf %>%
filter(indname == 'Transportation and Warehousing, and Utilities')
m8 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf8)
export_summs(m8)
| | Model 1 |
|---|---|
| (Intercept) | 80.87 |
| | (136.28) |
| Date | 0.05 *** |
| | (0.01) |
| PostCOVIDTRUE | 782.83 |
| | (459.93) |
| Date:PostCOVIDTRUE | -0.04 |
| | (0.02) |
| N | 54654 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
earnweekdf8 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 204863 rows containing non-finite values (stat_smooth).
## Warning: Removed 204863 rows containing missing values (geom_point).
earnweekdf8 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 250417 rows containing non-finite values (stat_smooth).
## Warning: Removed 250417 rows containing missing values (geom_point).
earnweekdf9 <- earnweekdf %>%
filter(indname == 'Construction')
m9 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf9)
export_summs(m9)
| | Model 1 |
|---|---|
| (Intercept) | -624.34 *** |
| | (123.89) |
| Date | 0.09 *** |
| | (0.01) |
| PostCOVIDTRUE | 1024.23 * |
| | (429.36) |
| Date:PostCOVIDTRUE | -0.06 * |
| | (0.02) |
| N | 58953 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
earnweekdf9 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 285415 rows containing non-finite values (stat_smooth).
## Warning: Removed 285415 rows containing missing values (geom_point).
earnweekdf9 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 334601 rows containing non-finite values (stat_smooth).
## Warning: Removed 334601 rows containing missing values (geom_point).
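The Manufacturing industry appears to have benefited after the COVID cutoff date. The fitted lines show an immediate increase in earnings per week that continues through the end of our sample, although neither the PostCOVIDTRUE term nor the interaction is significant at p < 0.05.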
earnweekdf10 <- earnweekdf %>%
filter(indname == 'Manufacturing')
m10 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf10)
export_summs(m10)
| | Model 1 |
|---|---|
| (Intercept) | -387.45 *** |
| | (104.00) |
| Date | 0.08 *** |
| | (0.01) |
| PostCOVIDTRUE | 587.38 |
| | (367.76) |
| Date:PostCOVIDTRUE | -0.03 |
| | (0.02) |
| N | 103661 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
earnweekdf10 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 359500 rows containing non-finite values (stat_smooth).
## Warning: Removed 359500 rows containing missing values (geom_point).
earnweekdf10 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 446431 rows containing non-finite values (stat_smooth).
## Warning: Removed 446431 rows containing missing values (geom_point).
Initially, the Other Services, Except Public Administration industry had a decrease in earnings per week. A few months after the COVID cutoff, earnings stabilized. In later years, the industry saw growth in earnings per week.
# Note: this reuses the names earnweekdf10 and m10, overwriting the Manufacturing objects above
earnweekdf10 <- earnweekdf %>%
filter(indname == 'Other Services, Except Public Administration')
m10 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf10)
export_summs(m10)
| | Model 1 |
|---|---|
| (Intercept) | -635.66 *** |
| | (143.28) |
| Date | 0.08 *** |
| | (0.01) |
| PostCOVIDTRUE | -70.22 |
| | (510.03) |
| Date:PostCOVIDTRUE | 0.01 |
| | (0.03) |
| N | 43190 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
earnweekdf10 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 190987 rows containing non-finite values (stat_smooth).
## Warning: Removed 190987 rows containing missing values (geom_point).
earnweekdf10 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 227255 rows containing non-finite values (stat_smooth).
## Warning: Removed 227255 rows containing missing values (geom_point).
The Wholesale Trade industry didn’t show an initial effect after the COVID cutoff date. After a few months, there was a slight and steady increase in earnings per week.
earnweekdf11 <- earnweekdf %>%
filter(indname == 'Wholesale Trade')
m11 <- lm(EARNWEEK ~ Date * PostCOVID, data = earnweekdf11)
export_summs(m11)
| | Model 1 |
|---|---|
| (Intercept) | -564.78 * |
| | (222.39) |
| Date | 0.09 *** |
| | (0.01) |
| PostCOVIDTRUE | 863.75 |
| | (788.25) |
| Date:PostCOVIDTRUE | -0.05 |
| | (0.04) |
| N | 23133 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
earnweekdf11 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 84994 rows containing non-finite values (stat_smooth).
## Warning: Removed 84994 rows containing missing values (geom_point).
earnweekdf11 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EARNWEEK, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,3500) + geom_hline(yintercept = 0) + xlim(mdy(09012019),mdy(09012020)) + labs(title = 'COVID Effect on Earnings over Time', y = 'Earnings', color = 'PostCOVID?')
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 104421 rows containing non-finite values (stat_smooth).
## Warning: Removed 104421 rows containing missing values (geom_point).
According to the models, only a few industries show a statistically significant effect of COVID on employees’ weekly earnings. For most, earnings increased as time went on, and COVID introduced little to no change in that trend. In the tables above, the Date:PostCOVIDTRUE interaction is marked at p < 0.05 for Retail Trade, for Professional, Scientific, and Management, and Administrative and Waste Management Services, and for Construction. The Agriculture, Forestry, Fishing, and Hunting, and Mining industry shows the largest change in slope, with almost constant weekly earnings relative to date throughout COVID, but its interaction is not marked as significant. For the Professional, Scientific, and Management, and Administrative and Waste Management Services industry, the negative interaction suggests that earnings growth slowed after the cutoff.
For the other industries, the effects were not statistically significant, so we can’t draw firm conclusions from them, although most industries’ average earnings declined after the COVID cutoff.
We’re testing to see the difference in employment before and after the COVID cutoff date. Similar to the earnings regressions, our group decided to fit a regression of employment on date for each industry, before and after the COVID cutoff date (2020/01/31). The first graph shows industry employment over the full period (macro, long-term). The second graph zooms in on employment close to the COVID cutoff date (micro, immediate effect).
TRUE (teal-colored-line) shows the employment post cutoff COVID date. FALSE (red-colored-line) shows the employment pre cutoff COVID date.
The code below drops respondents who are not in the labor force. We then code employed respondents as 1 and unemployed respondents as 0.
empstatdf <- earnweekdf %>%
filter(EMPSTAT > 09) %>%
filter(EMPSTAT < 23)
# Filters out not in labor force, includes employed and unemployed
empstatdf <- empstatdf %>%
mutate(EMPSTAT = replace(EMPSTAT, EMPSTAT == 10, 1)) %>%
mutate(EMPSTAT = replace(EMPSTAT, EMPSTAT == 12, 1))
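empstatdf <- empstatdf %>%
  mutate(EMPSTAT = replace(EMPSTAT, EMPSTAT == 20, 0)) %>%
  mutate(EMPSTAT = replace(EMPSTAT, EMPSTAT == 21, 0)) %>%
  # The filter above also keeps code 22, so recode it as unemployed too
  mutate(EMPSTAT = replace(EMPSTAT, EMPSTAT == 22, 0))
# Employed = 1, Unemployed = 0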
By filtering out those not in the labor force and assigning a binary variable to those employed and unemployed, we can see how, on average, employment levels changed throughout the Pre-COVID and Post-COVID periods.
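The recoding can be illustrated on a handful of codes. This is a base-R sketch with an invented vector; the codes 1, 32, and 36 are included only as examples of values outside the 10 to 22 range that the filter drops.

```r
# Toy EMPSTAT codes; 10 and 12 are treated as employed, 21 and 22 as unemployed,
# and everything outside 10-22 is dropped, mirroring the filter above
empstat <- c(1, 10, 12, 21, 22, 32, 36)

in_labor_force <- empstat > 9 & empstat < 23
kept <- empstat[in_labor_force]

# Binary outcome: 1 = employed, 0 = unemployed
employed <- ifelse(kept %in% c(10, 12), 1, 0)
print(data.frame(EMPSTAT = kept, employed = employed))

# The mean of a 0/1 variable is the share employed
employment_share <- mean(employed)
```

Because the outcome is 0/1, the fitted values of the regressions that follow can be read as employment shares.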
empstatdf1 <- empstatdf %>%
filter(indname == 'Retail Trade')
m12 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf1)
export_summs(m12)
| | Model 1 |
|---|---|
| (Intercept) | 0.81 *** |
| | (0.02) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -0.99 *** |
| | (0.05) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 509606 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
Interpreting the coefficients here, the Intercept (0.81) is the fitted employment share when Date is 0 and PostCOVID is FALSE. Because Date = 0 falls long before our time frame, this is an extrapolation, not the employment rate at the start of our data. The PostCOVIDTRUE coefficient is -0.99, meaning that, evaluated at Date = 0, COVID is associated with a 0.99 decrease in the employment share. The Date and interaction coefficients round to 0.00 at two decimals, so the size of the drop at the cutoff is easier to read from the graphs below, which show that people in this industry saw a sharp rise in unemployment right after the cutoff.
The rest of the industries have varying results. Some are very stable, and some show pretty drastic dips in employment around the cutoff (see the arts industry).
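One way to make the PostCOVID coefficient directly readable is to measure Date relative to the cutoff. The sketch below is an illustration on invented data, not a rerun of the models above: employment is a flat 0.95 before the cutoff and starts at 0.80 after it, so the true jump at the cutoff is -0.15. It compares the coefficient from a raw date with the one from a centered date.

```r
cutoff  <- as.Date("2020-01-31")
offsets <- c(-120, -90, -60, -30, 30, 60, 90, 120)  # days from the cutoff
n_per_date <- 200

# Invented employment shares: flat 0.95 before, 0.80 + 0.0005 * offset after
share <- ifelse(offsets > 0, 0.80 + 0.0005 * offsets, 0.95)
n_employed <- round(share * n_per_date)

# One row per person, with a 0/1 employment outcome
sim <- do.call(rbind, lapply(seq_along(offsets), function(i) {
  data.frame(
    day_raw      = as.numeric(cutoff) + offsets[i],  # days since 1970-01-01
    day_centered = offsets[i],                       # days since the cutoff
    post         = offsets[i] > 0,
    employed     = rep(c(1, 0), c(n_employed[i], n_per_date - n_employed[i]))
  )
}))

fit_raw      <- lm(employed ~ day_raw * post, data = sim)
fit_centered <- lm(employed ~ day_centered * post, data = sim)

# Centered date: the post coefficient is the jump at the cutoff
jump_centered <- unname(coef(fit_centered)["postTRUE"])

# Raw date: the jump has to be rebuilt from two coefficients
jump_from_raw <- unname(coef(fit_raw)["postTRUE"] +
  coef(fit_raw)["day_raw:postTRUE"] * as.numeric(cutoff))

print(c(raw_post_coef = unname(coef(fit_raw)["postTRUE"]),
        jump_centered = jump_centered,
        jump_from_raw = jump_from_raw))
```

Both fits describe the same lines. Only the point at which the PostCOVID coefficient is evaluated changes, which is why the raw-date coefficient can look implausibly large for a 0/1 outcome.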
empstatdf1 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf1 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 429663 rows containing non-finite values (stat_smooth).
## Warning: Removed 429663 rows containing missing values (geom_point).
empstatdf2 <- empstatdf %>%
filter(indname == 'Professional, Scientific, and Management, and Administrative and Waste Management Services')
m13 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf2)
export_summs(m13)
| | Model 1 |
|---|---|
| (Intercept) | 0.73 *** |
| | (0.01) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -0.40 *** |
| | (0.05) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 563640 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf2 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf2 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 471422 rows containing non-finite values (stat_smooth).
## Warning: Removed 471422 rows containing missing values (geom_point).
empstatdf3 <- empstatdf %>%
filter(indname == 'Finance and Insurance, and Real Estate and Rental and Leasing')
m14 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf3)
export_summs(m14)
| | Model 1 |
|---|---|
| (Intercept) | 0.88 *** |
| | (0.01) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -0.28 *** |
| | (0.05) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 314988 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf3 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + geom_hline(yintercept = 0)
## `geom_smooth()` using formula 'y ~ x'
empstatdf3 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 263011 rows containing non-finite values (stat_smooth).
## Warning: Removed 263011 rows containing missing values (geom_point).
empstatdf5 <- empstatdf %>%
filter(indname == 'Arts, Entertainment, and Recreation, and Accommodation and Food Services')
m16 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf5)
export_summs(m16)
| | Model 1 |
|---|---|
| (Intercept) | 0.70 *** |
| | (0.02) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -3.17 *** |
| | (0.07) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 434699 |
| R2 | 0.02 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf5 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf5 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 366911 rows containing non-finite values (stat_smooth).
## Warning: Removed 366911 rows containing missing values (geom_point).
The interesting thing about this industry is that employment drops in the short term but recovers in the long term, as it does in most industries. Shortening the window we look at makes that short-term drop easier to see, which is one of the benefits of the zoomed-in plot.
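The zoomed-in plots use xlim(), which drops the rows outside the window before geom_smooth() fits its lines (that is what the "Removed ... rows" warnings report), while the export_summs() tables still use the full sample. To get coefficients that match the zoomed-in picture, the same window can be applied to the data before fitting. The sketch below runs on simulated data, and the names `sim`, `sim_window`, `window_start`, and `window_end` are assumptions made for illustration.

```r
# Simulated stand-in for one industry's data (NOT the CPS extract).
set.seed(2)
cutoff <- as.Date("2020-01-15")
dates <- seq(as.Date("2018-01-01"), as.Date("2022-06-01"), by = "month")
sim <- data.frame(Date = rep(dates, each = 100))
sim$PostCOVID <- sim$Date >= cutoff
sim$EMPSTAT <- rbinom(nrow(sim), size = 1,
                      prob = ifelse(sim$PostCOVID, 0.85, 0.90))

# Same window as the zoomed-in plots: September 2019 to September 2020.
window_start <- as.Date("2019-09-01")
window_end <- as.Date("2020-09-01")
sim_window <- sim[sim$Date >= window_start & sim$Date <= window_end, ]

# Full-sample model versus a model fitted only inside the window.
m_full <- lm(EMPSTAT ~ Date * PostCOVID, data = sim)
m_window <- lm(EMPSTAT ~ Date * PostCOVID, data = sim_window)

nrow(sim)         # rows used by the full-sample model
nrow(sim_window)  # rows used by the windowed model
```

If the goal is only to zoom the picture without refitting the lines, coord_cartesian(xlim = ...) keeps all rows in the fit and changes just the visible range.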
empstatdf6 <- empstatdf %>%
filter(indname == 'Public Administration')
m17 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf6)
export_summs(m17)
| | Model 1 |
|---|---|
| (Intercept) | 0.94 *** |
| | (0.01) |
| Date | 0.00 ** |
| | (0.00) |
| PostCOVIDTRUE | -0.19 *** |
| | (0.05) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 241960 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf6 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf6 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 201684 rows containing non-finite values (stat_smooth).
## Warning: Removed 201684 rows containing missing values (geom_point).
empstatdf7 <- empstatdf %>%
filter(indname == 'Agriculture, Forestry, Fishing, and Hunting, and Mining')
m18 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf7)
export_summs(m18)
| | Model 1 |
|---|---|
| (Intercept) | 0.63 *** |
| | (0.03) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -0.04 |
| | (0.11) |
| Date:PostCOVIDTRUE | 0.00 |
| | (0.00) |
| N | 129956 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf7 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf7 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 109163 rows containing non-finite values (stat_smooth).
## Warning: Removed 109163 rows containing missing values (geom_point).
empstatdf8 <- empstatdf %>%
filter(indname == 'Transportation and Warehousing, and Utilities')
m19 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf8)
export_summs(m19)
| | Model 1 |
|---|---|
| (Intercept) | 0.82 *** |
| | (0.02) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -0.89 *** |
| | (0.07) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 256695 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf8 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf8 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 213985 rows containing non-finite values (stat_smooth).
## Warning: Removed 213985 rows containing missing values (geom_point).
empstatdf9 <- empstatdf %>%
filter(indname == 'Construction')
m20 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf9)
export_summs(m20)
| | Model 1 |
|---|---|
| (Intercept) | 0.55 *** |
| | (0.02) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -0.63 *** |
| | (0.07) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 341064 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf9 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf9 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 285575 rows containing non-finite values (stat_smooth).
## Warning: Removed 285575 rows containing missing values (geom_point).
empstatdf10 <- empstatdf %>%
filter(indname == 'Manufacturing')
m21 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf10)
export_summs(m21)
| | Model 1 |
|---|---|
| (Intercept) | 0.78 *** |
| | (0.01) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -0.90 *** |
| | (0.05) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 459225 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf10 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf10 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 385972 rows containing non-finite values (stat_smooth).
## Warning: Removed 385972 rows containing missing values (geom_point).
empstatdf11 <- empstatdf %>%
filter(indname == 'Other Services, Except Public Administration')
m22 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf11)
export_summs(m22)
| | Model 1 |
|---|---|
| (Intercept) | 0.77 *** |
| | (0.02) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -1.40 *** |
| | (0.07) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 231158 |
| R2 | 0.01 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf11 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf11 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 194267 rows containing non-finite values (stat_smooth).
## Warning: Removed 194267 rows containing missing values (geom_point).
empstatdf12 <- empstatdf %>%
filter(indname == 'Wholesale Trade')
m23 <- lm(EMPSTAT ~ Date * PostCOVID, data = empstatdf12)
export_summs(m23)
| | Model 1 |
|---|---|
| (Intercept) | 0.83 *** |
| | (0.03) |
| Date | 0.00 *** |
| | (0.00) |
| PostCOVIDTRUE | -0.42 *** |
| | (0.10) |
| Date:PostCOVIDTRUE | 0.00 *** |
| | (0.00) |
| N | 107289 |
| R2 | 0.00 |
| *** p < 0.001; ** p < 0.01; * p < 0.05. | |
empstatdf12 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID')
## `geom_smooth()` using formula 'y ~ x'
empstatdf12 %>% mutate(date_group = as.factor(PostCOVID)) %>% ggplot(aes(x = Date, y = EMPSTAT, color = date_group)) + geom_point(size = 0.5) + geom_vline(xintercept = mdy(01152020)) + geom_smooth(method = "lm", se = FALSE) + ylim(0,1) + geom_hline(yintercept = 0) + labs(title = 'COVID Effect on Employment Status over Time', y = 'Employed?', color = 'PostCOVID') + xlim(mdy(09012019),mdy(09012020))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 90443 rows containing non-finite values (stat_smooth).
## Warning: Removed 90443 rows containing missing values (geom_point).
Across industries, we observe a similar employment pattern: an initial loss in employment after COVID, followed by a slow increase around 2021. Some industries held fairly constant employment while others saw a sharp cut. Earnings and employment did not necessarily move together across industries; for example, industries with a decrease in earnings did not always have a decrease in employment.
In conclusion, the answer depends on the time frame retail cares about. Immediately after the COVID cutoff date, earnings in the manufacturing, construction, transportation, educational, and professional industries were not really negatively affected, while earnings in every other industry were. Around a year after the cutoff, weekly earnings had returned to pre-COVID levels, except in the agriculture industry, where they remained stagnant. By 2022, weekly earnings were back to pre-COVID levels or higher, again except in agriculture. This means the original industry leaders (finance, information, manufacturing, and other services and public administration) are still at the top for average earnings in 2022.
In terms of employment, every industry followed the same pattern: an initial loss in employment due to COVID, a slow increase starting around 2021, and an eventual return to pre-COVID employment rates in 2022. One thing to note is that the arts industry had the biggest initial loss in employment at the COVID cutoff date.
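Comparing industries like this is easier when the same model is fitted to every industry in one pass and the cutoff coefficients are collected in a single table. The sketch below shows one way to do that on simulated data. The industry names, the `drop_size` values, and the helper `fit_one()` are hypothetical, and the model uses days from the cutoff (`DaysFromCutoff`) rather than the raw Date so that the PostCOVIDTRUE coefficient can be read as the jump at the cutoff, which is a change from the models reported above.

```r
# Simulated stand-in for the industry-level data (NOT the CPS extract).
set.seed(3)
cutoff <- as.Date("2020-01-15")
dates <- seq(as.Date("2019-01-01"), as.Date("2021-12-01"), by = "month")
industries <- c("Industry A", "Industry B", "Industry C")  # hypothetical names
n_rep <- 100  # simulated people per industry per month

sim <- data.frame(
  Date = rep(dates, times = length(industries) * n_rep),
  indname = rep(industries, each = length(dates) * n_rep),
  stringsAsFactors = FALSE
)
sim$PostCOVID <- sim$Date >= cutoff
sim$DaysFromCutoff <- as.numeric(sim$Date - cutoff)

# Assumed drop in employment probability after the cutoff, per industry.
drop_size <- c("Industry A" = 0.02, "Industry B" = 0.10, "Industry C" = 0.30)
p <- 0.90 - ifelse(sim$PostCOVID, drop_size[sim$indname], 0)
sim$EMPSTAT <- rbinom(nrow(sim), size = 1, prob = p)

# Fit the model for one industry and return its cutoff coefficient as one row.
fit_one <- function(d) {
  m <- lm(EMPSTAT ~ DaysFromCutoff * PostCOVID, data = d)
  est <- summary(m)$coefficients["PostCOVIDTRUE", ]
  data.frame(
    indname = d$indname[1],
    jump = est[["Estimate"]],
    std_error = est[["Std. Error"]],
    p_value = est[["Pr(>|t|)"]],
    n = nrow(d)
  )
}

# One row per industry, sorted from the largest drop to the smallest.
results <- do.call(rbind, lapply(split(sim, sim$indname), fit_one))
results <- results[order(results$jump), ]
rownames(results) <- NULL
results
```

With the real data, `split(empstatdf, empstatdf$indname)` would take the place of the simulated split, assuming `empstatdf` carries the same columns.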
Some things to note about our data and our models:
There were a ton of observations on the first of each month, which makes it difficult to clearly see the effect of COVID, especially around the time it was declared a pandemic. A lot can change from month to month, with lockdowns varying from state to state and every place on a different timeline.
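Because the observations stack up on the first of each month, the 0/1 points in the plots sit on top of each other, and each month contributes only one distinct date. Summarizing to a monthly employment rate shows the level of detail the data can actually resolve. This is a minimal sketch on simulated data; `sim`, `monthly`, and the simulated rates are assumptions for illustration.

```r
# Simulated stand-in: 500 people observed on the first of each month.
set.seed(4)
dates <- seq(as.Date("2019-10-01"), as.Date("2020-06-01"), by = "month")
sim <- data.frame(Date = rep(dates, each = 500))
sim$EMPSTAT <- rbinom(nrow(sim), size = 1,
                      prob = ifelse(sim$Date >= as.Date("2020-04-01"), 0.80, 0.92))

# Share employed and number of observations for each month.
emp_rate <- tapply(sim$EMPSTAT, sim$Date, mean)
n_obs <- tapply(sim$EMPSTAT, sim$Date, length)

monthly <- data.frame(
  Date = as.Date(names(emp_rate)),
  emp_rate = as.vector(emp_rate),
  n = as.vector(n_obs)
)
monthly
```

Plotting `emp_rate` against `Date` gives one point per month, which makes a month-to-month drop visible in a way the raw 0/1 scatter does not.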
Also, our data does not account for inflation from 2019 to now. We did not control or adjust for it, so the coefficient on date in the earnings models may partly reflect inflation. The same goes for year-to-year raises, where applicable.
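An inflation adjustment would divide each nominal earnings value by a price index for its year and rescale to a base year. The sketch below shows the arithmetic only. The `price_index` values are placeholders invented for the example, not actual CPI figures, and `EARNWEEK_real` and `base_year` are names assumed for illustration.

```r
# Placeholder price index by year: made-up values for illustration only,
# NOT actual CPI figures. Replace with a published index before use.
price_index <- data.frame(
  YEAR = c(2019, 2020, 2021, 2022),
  index = c(100, 102, 104, 106)
)

# Toy nominal weekly earnings, one row per year.
earn <- data.frame(
  YEAR = c(2019, 2020, 2021, 2022),
  EARNWEEK = c(1000, 1000, 1040, 1100)
)

# Express every year's earnings in base-year dollars.
base_year <- 2019
base_index <- price_index$index[price_index$YEAR == base_year]
earn <- merge(earn, price_index, by = "YEAR")
earn$EARNWEEK_real <- earn$EARNWEEK * base_index / earn$index
earn
```

In this toy example the 2021 value of 1040 deflates back to 1000 in 2019 dollars, so an apparent raise disappears once the placeholder price change is removed.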
A lot of the models weren’t statistically significant when looking at weekly employee earnings. For those industries, we cannot reject the null hypothesis that COVID had no effect on earnings.
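Failing to reject the null is not the same as showing there was no effect, so it can help to report a confidence interval next to the p-value: the interval shows the range of jumps the data are consistent with. The sketch below pulls both from an lm() fit on simulated earnings. The data, the `DaysFromCutoff` column, and the model form are assumptions for illustration rather than our fitted earnings models.

```r
# Simulated weekly earnings with NO true jump at the cutoff (NOT the CPS data).
set.seed(6)
cutoff <- as.Date("2020-01-15")
dates <- seq(as.Date("2019-01-01"), as.Date("2021-12-01"), by = "month")
sim <- data.frame(Date = rep(dates, each = 30))
sim$PostCOVID <- sim$Date >= cutoff
sim$DaysFromCutoff <- as.numeric(sim$Date - cutoff)
sim$EARNWEEK <- 900 + 0.05 * sim$DaysFromCutoff + rnorm(nrow(sim), sd = 400)

m <- lm(EARNWEEK ~ DaysFromCutoff * PostCOVID, data = sim)

# Estimate, p-value, and 95% confidence interval for the jump at the cutoff.
coef_table <- summary(m)$coefficients
estimate <- coef_table["PostCOVIDTRUE", "Estimate"]
p_value <- coef_table["PostCOVIDTRUE", "Pr(>|t|)"]
ci <- confint(m, "PostCOVIDTRUE", level = 0.95)

estimate
p_value
ci
```

A wide interval that includes zero means the data cannot distinguish between no change and a fairly large change, which is a weaker statement than "earnings did not change".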